[opt](cloud) optimize load performance for inverted index when pack small files#59011
liaoxin01 merged 4 commits into apache:master
run buildall |
Pull request overview
This PR optimizes the load performance for inverted indexes when packing small files by introducing non-blocking file writer close operations and adding explicit file writer cleanup logic.
Key Changes
- Changed the file writer close operation to non-blocking mode (`close(true)`) in `FSIndexOutputV2` to improve performance
- Added an explicit cleanup loop in `SegmentFlusher::close()` to ensure that the underlying file writers of all index file writers are properly closed
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/olap/rowset/segment_v2/inverted_index_fs_directory.cpp | Modified FSIndexOutputV2::close() to use non-blocking close (close(true)) for the underlying file writer to improve performance |
| be/src/olap/rowset/segment_creator.cpp | Added explicit loop to close underlying file writers after closing the index file collection, ensuring proper resource cleanup even if errors occur during the close chain |
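The cleanup behavior described for `segment_creator.cpp` (close every underlying writer even if one close fails) can be illustrated with a minimal sketch. This is not the code from the PR: `SketchStatus`, `SketchWriter`, and `close_all` are hypothetical names invented for illustration, and Doris's real `Status` and file writer classes are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status type for this sketch only (not Doris's Status class).
struct SketchStatus {
    bool ok;
    std::string msg;
    SketchStatus() : ok(true) {}
    SketchStatus(bool o, std::string m) : ok(o), msg(std::move(m)) {}
};

// Hypothetical writer: records that close() ran and can simulate a failure.
struct SketchWriter {
    bool fail = false;
    bool closed = false;

    SketchStatus close() {
        assert(!closed && "close() should be called once per writer");
        closed = true;
        return fail ? SketchStatus(false, "close failed") : SketchStatus();
    }
};

// Close every writer, even after an earlier one reports an error.
// The first error seen is returned; later writers are still closed.
inline SketchStatus close_all(std::vector<SketchWriter>& writers) {
    SketchStatus first_error;
    for (auto& w : writers) {
        SketchStatus st = w.close();
        if (!st.ok && first_error.ok) {
            first_error = st;
        }
    }
    return first_error;
}
```

The design choice shown is to keep looping after a failure instead of returning early, so that no writer is left open; whether the actual change reports the first error, the last error, or all of them is not stated in this review.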
TPC-H: Total hot run time: 36035 ms |
TPC-DS: Total hot run time: 178491 ms |
ClickBench: Total hot run time: 27.31 s |
1edde83 to 2e5391a
|
run buildall |
TPC-H: Total hot run time: 36456 ms |
TPC-DS: Total hot run time: 178672 ms |
ClickBench: Total hot run time: 27.3 s |
|
run buildall |
TPC-H: Total hot run time: 34973 ms |
TPC-DS: Total hot run time: 177439 ms |
ClickBench: Total hot run time: 27.13 s |
|
run buildall |
TPC-H: Total hot run time: 35195 ms |
TPC-DS: Total hot run time: 178442 ms |
ClickBench: Total hot run time: 27.95 s |
2e5391a to 05424bc
|
run buildall |
TPC-H: Total hot run time: 36514 ms |
TPC-DS: Total hot run time: 178401 ms |
ClickBench: Total hot run time: 27.32 s |
run buildall |
fa17d90 to 2023b08
|
run buildall |
TPC-H: Total hot run time: 36545 ms |
TPC-DS: Total hot run time: 178672 ms |
ClickBench: Total hot run time: 27.11 s |
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 35195 ms |
TPC-DS: Total hot run time: 179329 ms |
ClickBench: Total hot run time: 27.45 s |
…mall files (apache#59011)
Related PR: apache#57770
Problem Summary: When merging small files with inverted indexes, the segment close operation was synchronously waiting for inverted index files to be uploaded to S3. This blocking behavior significantly impacted the memtable flush thread performance, causing bottlenecks in the data loading pipeline.
Solution: The solution introduces a two-phase close mechanism for inverted index file writers:
1. **Asynchronous Close Phase**: During segment close, inverted index files are closed asynchronously and the S3 upload task is submitted immediately without waiting for completion.
2. **Wait Phase**: When the load channel closes, the system waits for all pending S3 upload tasks to complete, ensuring data consistency.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #57770
Problem Summary:
When merging small files with inverted indexes, the segment close operation waited synchronously for the inverted index files to be uploaded to S3. This blocking wait slowed the memtable flush thread and created a bottleneck in the data loading pipeline.
Solution:
The solution introduces a two-phase close mechanism for inverted index file writers:
1. **Asynchronous Close Phase**: During segment close, inverted index files are closed asynchronously and the S3 upload task is submitted immediately without waiting for completion.
2. **Wait Phase**: When the load channel closes, the system waits for all pending S3 upload tasks to complete, ensuring data consistency.
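The two phases can be sketched with standard C++ futures. This is only an illustration under stated assumptions, not the Doris implementation: `SketchIndexWriter`, `close_async`, `wait_closed`, `close_segments_async`, and `wait_all_uploads` are hypothetical names, and the S3 upload is simulated with a short sleep.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an inverted index file writer (not the Doris API).
class SketchIndexWriter {
public:
    explicit SketchIndexWriter(std::string name) : _name(std::move(name)) {}

    const std::string& name() const { return _name; }

    // Phase 1: submit the (simulated) upload and return without waiting.
    void close_async() {
        assert(!_upload.valid() && "close_async() must be called only once");
        _upload = std::async(std::launch::async, [] {
            // Simulated remote upload latency.
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            return true;  // simulated upload always succeeds
        });
    }

    // Phase 2: block until the upload submitted in phase 1 has finished.
    // Returns false if no upload was submitted.
    bool wait_closed() { return _upload.valid() && _upload.get(); }

private:
    std::string _name;
    std::future<bool> _upload;
};

// Segment close: start the upload for every writer, without waiting.
inline void close_segments_async(std::vector<SketchIndexWriter>& writers) {
    for (auto& w : writers) {
        w.close_async();
    }
}

// Load channel close: wait for every pending upload and report overall success.
inline bool wait_all_uploads(std::vector<SketchIndexWriter>& writers) {
    bool ok = true;
    for (auto& w : writers) {
        ok = w.wait_closed() && ok;  // always wait, even after a failure
    }
    return ok;
}
```

In this sketch the caller of `close_segments_async` (standing in for the flush thread) returns as soon as the uploads are submitted, and the uploads overlap with each other; only `wait_all_uploads` blocks.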
Release note
None